Three experiments investigated the effect of an interfering white noise
on the recognition of brief-duration complex sounds. Listeners were
presented with a 20 msec signal followed, after a variable delay, by a
500 msec white noise burst. Their task was to classify the signal into
one of two categories on the basis of either its fundamental frequency,
waveform or formant frequency. The main focus of the experiments was
to investigate the relation between performance and the auditory features
or cues present in the signal. Recognition performance improved with
increasing inter-stimulus intervals up to an asymptote at approximately
200 msec. This finding is consistent with earlier results in suggesting
that brief-duration signals are retained for a short time in a
precategorical sensory memory for further processing. In addition,
the data revealed that asymptotic performance level was determined
primarily by the distinctiveness or discriminability of the relevant
auditory feature and by the amount of listener experience with the
relevant feature. It was concluded that practiced listeners have
an improved ability to selectively focus their attention on specific
auditory cues in a complex aural display.
Rarely do we take second notice of the multitude of meaningful non-speech
sounds we are exposed to daily. As with other modalities, the perception of
meaningful auditory stimuli requires little effort and proceeds rather
"automatically." Although we are normally not aware of the ongoing processes
responsible for auditory perception, much empirical evidence indicates that
substantial perceptual processing is required to recognize even the simplest
auditory signal (e.g., Wightman & Green, 1974).

Although  we  can  readily  identify  a variety  of  sounds,  there  are  large 
differences  in  the  relative  difficulty  with  which  particular  sounds  may  be 
classified.  The  reasons  for  this  can  be  revealed  through  an  analysis  of  the 
psychophysical  structure  of  the  sounds  and  of  the  perceptual  processes  required 
to  experience  them.  For  instance,  when  classifying  automobile  and  jet-turbine 
sounds,  the  differences  between  the  physical  structure  of  the  two  categories 
might  be  more  apparent  than  the  physical  distinctions  among  different  sounds 
within  each  set.  These  physical  distinctions  influence  the  perceptual  processes 
applied  to  the  sounds,  leading  to  differences  in  perceptual  complexity.  In 
general, the greater the number of physical components that various sounds have
in common, the more difficult it will be to distinguish one from another
psychologically.

The  psychological  literature  indicates  a tendency  for  listeners  to  exhibit 
selective  sensitivity  to  particular  auditory  dimensions  or  features.  The 
present  report  describes  three  experiments  that  further  investigate  this 
phenomenon  by  requiring  listeners  to  identify  the  perceived  quality  (high  pitch 
or  low  pitch,  for  instance)  of  one  of  three  acoustic  dimensions  (funda- 
mental frequency,  waveform,  or  formant  frequency)  in  brief-duration  complex 
auditory  signals.  A complete  description  of  these  dimensions  will  be 
presented below.

The analysis of auditory information processing, like that of visual processing,
has typically followed a classical multi-stage model (cf. Lindsay & Norman,
1972; Neisser, 1967). Although the details of such models vary from author to
author, in general four major stages are identified: detection, feature
extraction, recognition, and response. Each stage or sublevel represents a series
of mental processes that share common characteristics. According to this
model, in the first or detection stage the stimulus is encoded or converted to
an internal representation. Frequently, this stage is associated with a
sensory store or memory where information in its literal form is held briefly
for later processing. The second, or feature extraction, stage is considered to
be the level at which the raw representation of the stimulus is broken down
into its major features. A square wave, for instance, is analyzed in terms of
its loudness and pitch, which in turn may be based on the relations among its
various harmonics.

The  third  level  of  processing,  or  the  recognition  stage,  is  of  special 
relevance  to  memory  since  it  is  at  this  stage  that  the  interaction  of  raw  stimuli 
and  higher  cognitive  processes  takes  place.  For  instance,  an  incoming  pattern 
might  be  compared  with  previously  stored  patterns  according  to  a particular  rule. 
The  result  of  this  comparison  is  the  naming  or  categorization  of  the  original 
pattern.  The  fourth  and  last  stage,  the  response  stage,  involves  a determination 
of  the  output.  This  final  stage  is  of  little  interest  to  the  present  report 
and  will  not  be  discussed  further. 

From this general model it is apparent that auditory perception depends
on the ability of a listener to extract featural information from the stimuli,
to transform this information into a meaningful form by referencing stored
information, and finally to initiate some sort of response. Furthermore, the
psychological mechanism underlying pattern recognition is evidently a
dynamic one involving the interrelation of all four levels of processing.
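
The four stages can be made concrete with a toy pipeline. This is a minimal Python sketch under stated assumptions, not a model from the literature: the "sensory store" is simply a buffered copy of the waveform, the only extracted feature is a crude pitch estimate from zero crossings, and the stored prototypes (90 Hz = "low", 140 Hz = "high") and all function names are hypothetical.

```python
import math

def detect(signal, sample_rate):
    """Detection: encode the stimulus and hold it briefly in a 'sensory store'."""
    return {"samples": list(signal), "rate": sample_rate}

def extract_features(store):
    """Feature extraction: crude pitch estimate from upward zero crossings per second."""
    s = store["samples"]
    crossings = sum(1 for a, b in zip(s, s[1:]) if a < 0 <= b)
    return {"pitch_hz": crossings * store["rate"] / len(s)}

def recognize(features, prototypes):
    """Recognition: name the stored pattern nearest to the incoming features."""
    return min(prototypes, key=lambda name: abs(features["pitch_hz"] - prototypes[name]))

def respond(label):
    """Response: produce the overt output (here, just the label)."""
    return label

def classify(signal, sample_rate, prototypes):
    """Run the four stages in order."""
    return respond(recognize(extract_features(detect(signal, sample_rate)), prototypes))

# Usage: a hypothetical 100 msec, 140 Hz sine sampled at 8 kHz.
rate = 8000
tone = [math.sin(2 * math.pi * 140 * n / rate) for n in range(800)]
print(classify(tone, rate, {"low": 90.0, "high": 140.0}))  # prints: high
```

The strictly sequential flow is a simplification; the text stresses that the levels interact dynamically.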

Distinctive Features of Auditory Stimuli

For  the  most  part,  the  major  emphasis  in  past  studies  of  auditory  per- 
ception has  been  on  the  perception  of  the  major  parameters  of  relatively  simple 
acoustic  waveforms.  These  include  such  elements  as  frequency,  amplitude,  phase, 
and  harmonic  content.  Such  studies  count  into  the  hundreds  and  have,  conse- 
quently, provided  psychoacousticians  with  a rich  background  of  material  for 
the  analysis  of  more  complex  auditory  patterns.  Although  it  is  clearly  important 
to  understand  how  such  basic  acoustic  features  are  perceived,  it  is  also  important 
to  make  a distinction  between  simple  auditory  features  such  as  pitch  (frequency), 
loudness  (amplitude)  and  duration,  and  more  complex  features  such  as  tonal 
complexity.  Gibson  (1966)  especially  stressed  this  distinction:  "Instead 
of  simple  pitch,  the  (auditory  stimuli)  vary  in  timbre  or  tone  quality,  in 
combinations  of  tone  quality,  in  vowel  quality,  in  approximations  to  noise,  in 
noise  quality  and  in  changes  of  all  these  in  time  . . . ."  It  is  obvious 
that  the  relations  among  features  of  naturally  occurring  auditory  events  are 
critical to their perception, and a more complete statement of the auditory
recognition process cannot be made without a consideration of the perceptual
properties of these more complex physical events.

Several  methods  have  been  used  to  examine  the  relative  importance  of 
complex  auditory  features.  One  successful  method  of  investigating  the  degree 
of  feature  importance  or  saliency  involves  the  application  of  a scaling  technique 
such  as  the  INDSCAL  multidimensional  scaling  (MDS)  method  (Carroll  & Chang,  1970). 
Such a method has been employed for speech sounds (Shepard, 1972) and musical sounds
(Miller & Carterette, 1975; Grey, 1977), and, most recently, in the
analysis of sonar (Howard, 1977) and sonar-like sounds (Howard & Silverman, 1976).
In this latter study, the MDS technique was used to derive a relation between
certain physical characteristics of complex non-speech sounds and their
perceptual correlates. The MDS technique was found to be a successful method of
determining which of the physical characteristics of a complex sound are
perceptually important.
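
INDSCAL fits a weighted model across individual listeners and is too involved for a short example. As a simpler, swapped-in illustration of the underlying idea, the sketch below implements classical (Torgerson) metric MDS with NumPy: it recovers a spatial configuration of items from their pairwise dissimilarities. The four "sounds" and their positions are hypothetical.

```python
import numpy as np

def classical_mds(dist, k=2):
    """Classical (Torgerson) metric MDS: embed n items in k dimensions
    from an n x n matrix of pairwise distances."""
    n = dist.shape[0]
    # Double-center the squared distances: B = -1/2 * J D^2 J, with J = I - (1/n) 11'.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (dist ** 2) @ j
    # B is symmetric; eigh returns eigenvalues in ascending order.
    vals, vecs = np.linalg.eigh(b)
    top = np.argsort(vals)[::-1][:k]
    # Coordinates: leading eigenvectors scaled by the square roots of their
    # eigenvalues (negative eigenvalues, possible for non-Euclidean data, clipped to 0).
    return vecs[:, top] * np.sqrt(np.clip(vals[top], 0.0, None))

def pairwise_distances(x):
    """Euclidean distance between every pair of rows of x."""
    diff = x[:, None, :] - x[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

# Hypothetical 2-D "perceptual" positions for four sounds.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 1.0], [3.0, 1.0]])
d = pairwise_distances(points)
coords = classical_mds(d, k=2)
# The configuration is recovered only up to rotation, reflection, and translation,
# so compare inter-point distances rather than raw coordinates.
print(np.allclose(pairwise_distances(coords), d))  # True
```

With real similarity judgments the distances are not exactly Euclidean, and the size of each retained eigenvalue indicates how much of the structure a dimension carries.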

For the 16 sounds investigated by Howard and Silverman, three auditory
features were found to differ in featural saliency: fundamental frequency,
waveform, and the number and center frequency of formants. That is, in
judging the similarity or difference between complex tones composed of these
features, listeners tended to use the fundamental frequency to a greater degree
than waveform and formant frequency. The present study investigates the
perception of these three features in brief-duration signals.

Feature Extraction in Brief-Duration Auditory Signals

Earlier work has shown that when an auditory stimulus is presented for only
a brief duration, identification of the signal requires processing time
that exceeds the duration of the stimulus itself (e.g., Massaro, 1975;
Sparks, 1976). Massaro (1975) has argued that a preperceptual auditory image
or memory, which "... contains the necessary information for perceptual
processing," persists beyond the stimulus presentation. The listener's
auditory recognition processes are applied to the preperceptual image,
extracting information about the auditory features present in the memory.

Two  experimental  procedures  have  been  used  to  examine  memory  for  non-speech 
sounds:  a paired  comparison  task  and  a recognition  interference  task. 

In paired-comparison tasks, a standard and a comparison tone are compared.
Typically, an increase in the inter-stimulus interval (ISI) between the tones
results in a decline in performance (Harris, 1952; Bachem, 1954). This has
been attributed to a decay of the auditory image of the standard tone. Under
conditions of variable intervals between the standard and comparison tones,
performance can be degraded by events occurring between the standard and
comparison tones. Wickelgren (1966) examined this effect by varying not only
tone and ISI duration but also the duration of an interpolated interfering
tone. In general, he found that (a) the longer the duration of an interfering
tone, the poorer the memory for the pitch of the standard; (b) increasing
the duration of the standard tone from two to eight seconds facilitated
memory for pitch; and (c) there were substantial differences in listeners'
performance with different types of auditory stimuli. Especially important to
the present study is Wickelgren's suggestion that recognition performance
is related not only to the duration of the stimulus and interference tones
themselves but also to the nature of the interfering material. These findings
have been supported in recent studies by Watson, Wroton, Kelly and Benbassat
(1975) and Sparks (1976).

Deutsch, in a series of studies (1972a, 1972b), further examined
this interactive effect for pitch memory. In general, her results indicated
that memory for the standard and comparison tones can be disrupted or enhanced
depending on the pitch of intervening tones. The standard and comparison tones
were to be compared for pitch, and six tones were presented within a
five-second interval between them. When the second tone of the sequence
was identical to the standard, memory for the standard was facilitated. As the
second interpolated tone changed in pitch (in increasing 5 Hz steps), memory
for the standard decreased. In general, identification performance in a
paired-comparison task will be influenced by the duration of the stimuli, the
separation in time between stimuli, and auditory events occurring between the tones.

In recognition interference studies, a tone is presented to an observer
for a few milliseconds. Under these conditions, a click-like sound will be
heard. If the same tone is presented for a longer duration (greater than
10 msec), the tone will begin to acquire a pitch quality. In general, when
asked to report the pitch of the tone, an observer's accuracy increases as
the duration of the tone increases (Massaro, 1975). For longer tones, more
information is available for determining the pitch of the stimulus. In
addition, if an observer is asked to report the pitch of a constant-duration
tone followed after a variable interval by a masking sound, accuracy will
increase as the time between tone offset and noise onset increases.

Massaro has been particularly interested in the nature of the auditory
image created by a tone, and suggests that a "temporal unit of an auditory
stimulus is stored in a preperceptual auditory store" for later processing.
This is very much like the visual image that persists after the offset of a
visual stimulus (cf. Sperling, 1960; Averbach & Coriell, 1961; Neisser, 1967).
In both modalities, perceptual processing depends on a preperceptual image
where information is held for subsequent processing. Most important is the
fact that the readout process takes a certain amount of time, emphasizing
"the temporal course of perceptual processing." Massaro's basic argument for
a preperceptual store is that "features (of an auditory image) cannot be
recognized as they arrive since this requires that perception be immediate" (1972).
He provides strong evidence for the existence of an image that outlasts a
stimulus by manipulating the events following stimulus offset.

Observers in one of Massaro's experiments were trained to identify the
particular quality of the stimuli. For instance, a tone might be of a "sharp"
nature (square wave) or have a "dull" quality (sine); its pitch might be high
(high frequency) or low (low frequency). Massaro reported that during testing,
recognition performance improved with increases in the silent ISI up to 250 msec.
Recognition performance did not improve beyond this interval. In general, the
poor performance after short ISIs was thought to reflect the termination of
perceptual processing of the auditory image by the masking tone. Massaro and
others (Elliot, 1967; Homick, Elfner, & Bothe, 1969; Efron, 1970) have examined
these results further in forward masking designs as well as under conditions
requiring the listener to estimate the duration of a test tone (since short
tones produce an auditory image, listeners tend to overestimate their duration).

Summary 

To summarize, the previous discussion has emphasized four major aspects
of auditory information processing. First, the recognition of all auditory
patterns,  whether  speech  or  non-speech,  is  based  on  their  important  auditory 
features.  This  is  true  for  simple  stimuli,  such  as  pure  tones,  as  well  as 
for  more  complex  patterns.  Second,  since  time  is  required  to  process  this 
information,  the  sounds  must  be  maintained  in  some  form  of  preperceptual 
memory.  Third,  it  is  convenient  to  consider  auditory  perception  as  the  result 
of  an  ordered  flow  of  information  through  specialized  elements  of  a processing 
system.  Finally,  it  is  important  to  consider  the  processing  of  any  auditory 
event  as  the  product  of  a dynamic  interdependence  among  a variety  of  processes. 

At  present,  little  research  has  considered  how  the  recognition  of  a 
complex,  brief-duration,  non-speech  sound  depends  on  the  particular  dimensions 
or  features  present  in  that  signal.  Much  of  the  evidence  presented  thus  far 
suggests  that  listeners  are  differentially  sensitive  to  particular  acoustic 
information.  Sensitivity,  in  the  present  usage,  refers  to  the  ability  of  a 
listener  to  classify  a tone  presented  for  a very  short  duration  and  followed  by 
an interfering burst of white noise. Those auditory features that are more
important or salient should be less affected by the white noise (regardless
of its temporal proximity) than those that are less salient.

The  present  study  is  designed  to  investigate  the  relative  importance  of 
different  auditory  features  in  a procedure  similar  to  that  of  Massaro.  How- 
ever, unlike  Massaro,  the  present  research  focuses  primarily  on  the  relation  of 
performance  to  the  featural  composition  of  the  auditory  stimuli.  More  speci- 
fically, the  primary  question  asked  is:  if  a particular  signal  is  composed 
of  some  combination  of  two  possible  frequencies,  waveforms  (e.g.,  triangle  or 
square),  and  has  different  concentrations  of  acoustic  energy  (formants),  which 
of these features "stand out" or lend themselves to clearer identification over
others?  If  the  listener  is  allowed  to  extract  the  important  information  from 
a stimulus  under  conditions  where  the  amount  of  time  allowed  for  this  process 
is  constrained,  the  relative  perceptual  importance  of  particular  features  should 
be  reflected  in  the  overall  performance.  The  recognition  interference  task 
employed  in  the  present  study  will  not  only  provide  information  about  the 
relative  importance  of  specific  auditory  features,  but  will  also  permit  inferences 
about  the  temporal  course  of  the  feature  extraction  process. 

EXPERIMENT  1 

Experiment  1 examines  the  effect  of  a masking  noise  on  the  classification 
of  brief-duration  auditory  signals  varying  in  either  fundamental  frequency, 
waveform  or  formant  frequency.  Each  observer  was  asked  to  classify  the  signals 
into  one  of  two  categories  (e.g.,  high  or  low)  on  the  basis  of  each  of  the  three 
features on different days. Since performance data are available for each
observer on each feature, classification performance can be compared across
the different features.

I. Method


A.  Observers 

The  listeners  were  four  female  and  four  male  students  (aged  U-29  years). 
All but one of the students attended The Catholic University of America. All
reported having normal hearing; four listeners were randomly chosen and
administered audiograms. Each observer was paid $18.00 to participate in three
two-hour sessions.

B.  Stimuli 

As in the Howard and Silverman (1976) study, the stimuli were generated
by driving a laboratory-constructed formant filter (Graeme, 1971) with a square
or triangular wave at either 90 Hz or 140 Hz. At its center frequency, the
formant filter's response was 10 dB greater than its response at 100 Hz.
The stimuli were recorded continuously on magnetic tape for later
playback during the experiment. Table 1 displays an outline of the three
auditory features employed in the present experiment. The following notation
was developed to describe the varying stimulus parameters:

$S = (F_i, W_j, f_k)$

where

$S$ = stimulus
$F$ = fundamental frequency
$W$ = waveform
$f$ = formant frequency.

Fundamental frequency was either high ($F_h$) or low ($F_l$), waveform was either
triangular ($W_t$) or square ($W_s$), and formant frequency either high ($f_h$) or low
($f_l$). Stimuli were equated for loudness for each listener using the method of
constant stimuli in a pilot study.
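
The notation implies a 2 x 2 x 2 stimulus set. The sketch below enumerates the eight combinations and, for a chosen feature, pairs the stimuli that differ only on that feature, which gives the four counterbalanced combinations per feature described later. The string labels simply transliterate the subscripts above; reading F_h/F_l as 140/90 Hz and f_h/f_l as formant centers of 940/600 Hz is an assumption based on the stimulus description.

```python
from itertools import product

# Levels of each feature, written with the subscripts of S = (F_i, W_j, f_k).
# Assumed mapping: F_h/F_l = 140/90 Hz; f_h/f_l = formant centered at 940/600 Hz.
FEATURES = {
    "fundamental": ("F_h", "F_l"),
    "waveform": ("W_s", "W_t"),   # square / triangular
    "formant": ("f_h", "f_l"),
}
NAMES = list(FEATURES)

# All 2 x 2 x 2 = 8 stimuli, each an (F, W, f) tuple.
STIMULI = list(product(*FEATURES.values()))

def pairs_varying(feature):
    """Stimulus pairs differing only on `feature`: one pair for each
    combination of the other two (counterbalanced) features."""
    i = NAMES.index(feature)
    first, second = FEATURES[feature]
    pairs = []
    for s in STIMULI:
        if s[i] == first:
            partner = s[:i] + (second,) + s[i + 1:]
            pairs.append((s, partner))
    return pairs

for s1, s2 in pairs_varying("fundamental"):
    print(s1, "vs", s2)
```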

C.  Apparatus 

All laboratory events were under digital computer control (PDP-8/E).
Figure 1 displays a diagram of the instrumentation. All signals were played
continuously on a 4-track stereo tape deck (TEAC 3300) into one of two picoreed
relays. The broadband white noise (20 Hz - 20 kHz) was generated by a B & K
model 1402 Random Noise Generator. A third relay presented the noise. The
tones and interfering noise were presented over matched Telephonics TDH-49
headphones at 83 dB SPL.

Table 1

1   S1 = (F_h, W_s, f_h)    S9  = (F_h, W_s, f_h)    S17 = (F_h, W_s, f_h)
2   S2 = (F_h, W_t, f_h)    S10 = (F_l, W_s, f_h)    S18 = (F_h, W_s, f_l)
3   S3 = (F_l, W_s, f_h)    S11 = (F_h, W_s, f_l)    S19 = (F_h, W_t, f_h)
4   S4 = (F_l, W_t, f_h)    S12 = (F_l, W_s, f_l)    S20 = (F_h, W_t, f_l)
5   S5 = (F_h, W_s, f_l)    S13 = (F_h, W_t, f_h)    S21 = (F_l, W_s, f_h)
6   S6 = (F_h, W_t, f_l)    S14 = (F_l, W_t, f_h)    S22 = (F_l, W_s, f_l)
7   S7 = (F_l, W_s, f_l)    S15 = (F_h, W_t, f_l)    S23 = (F_l, W_t, f_h)
8   S8 = (F_l, W_t, f_l)    S16 = (F_l, W_t, f_l)    S24 = (F_l, W_t, f_l)

D.  Procedure 

There  were  two  phases  to  the  experiment.  The  first,  or  training  phase, 
required  the  listener  to  report  whether  a one-second  tone  was  "high"  or  "low" 
in  the  fundamental  frequency  or  formant  frequency  condition,  and  "sharp"  or 
"dull"  in  the  waveform  condition.  Training  on  a particular  feature  occurred 
during the first block of each four-block test session. A total of 300 trials
were  presented  during  this  phase.  All  listeners  achieved  and  generally  exceeded 
a level  of  75  percent  correct  on  all  features.  After  training,  350  test  trials 
were  presented  per  block  during  the  second,  third,  and  fourth  blocks.  The 
feature  presented  during  training  was  identical  to  the  feature  that  varied 
during  testing.  This  sequence  of  one  training  block  followed  by  three  testing 
blocks  was  repeated  for  each  of  the  three  features  on  separate  days. 

Signals  were  counterbalanced  for  all  features  across  listeners.  For 
example,  consider  fundamental  frequency  as  displayed  in  Table  1.  Observers 
were presented stimuli counterbalanced for waveform and formant frequency.
Two of the listeners (#1 and #2) were presented tones generated by driving the
formant filter centered at 940 Hz by a 90 Hz or 140 Hz square wave. Listeners
3 and  4 were  presented  tones  generated  by  driving  the  formant  filter  centered 
at  600  Hz  by  a 90  Hz  or  140  Hz  square  wave.  Listeners  5 through  8 received 
similar  counterbalancing  with  triangular  wave  signals.  The  waveform  and  formant 
frequency  conditions  were  counterbalanced  in  a similar  manner.  The  sequence  of 
treatment  presentation  was  randomized  over  listeners. 

On each trial a complex signal was presented for 20 msec followed, after
a variable ISI (0, 40, 80, 160, 250, 350, or 500 msec), by a 500 msec white noise
burst. The listener reported whether the tone was high or low (sharp or dull).
Feedback was given on all trials. Figure 2 presents a diagram of the sequence
of events for one experimental trial. In all conditions a single feature was
varied while the other features remained constant. The ISI as well as the
levels within each dimension were presented randomly during each session, with
the only constraint that an equal number of trials occurred for all ISIs and
levels.

Each experimental block consisted of 350 trials (25 trials per ISI and level
x 7 ISIs x 2 levels). An average trial took 4.5 seconds. There were three
25-minute blocks of trials with a 10-minute rest period between blocks.
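
The block arithmetic (25 x 7 x 2 = 350 trials; 350 x 4.5 s = 1575 s, about 26 minutes, in line with the stated 25-minute blocks) and the equal-cells constraint can be sketched as follows. Shuffling a fully balanced list is one simple way to satisfy the constraint; the report does not specify its actual randomization routine, so this is an assumption, and the names below are hypothetical.

```python
import random
from collections import Counter

ISIS_MSEC = (0, 40, 80, 160, 250, 350, 500)
LEVELS = ("high", "low")      # "sharp"/"dull" in the waveform condition
TRIALS_PER_CELL = 25          # per ISI x level combination
SECONDS_PER_TRIAL = 4.5       # average trial duration reported above

def make_block(seed=None):
    """One test block: each ISI x level cell appears equally often, in random order."""
    trials = [
        (isi, level)
        for isi in ISIS_MSEC
        for level in LEVELS
        for _ in range(TRIALS_PER_CELL)
    ]
    random.Random(seed).shuffle(trials)
    return trials

block = make_block(seed=1)
cells = Counter(block)
print(len(block), len(cells), set(cells.values()))      # 350 14 {25}
print(len(block) * SECONDS_PER_TRIAL / 60, "minutes")    # 26.25 minutes
```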

II.  Results 

Recognition  performance  was  recorded  for  all  listeners  for  each  ISI  and 
feature. Percent correct scores were converted to levels of d' with the use of
the  Elliot  table  presented  in  Swets  (1964).  A discussion  of  the  rationale 
for  applying  the  theory  of  signal  detection  to  a recognition  memory  paradigm 
can  be  found  in  Egan  (1958).  A preliminary  analysis  of  the  performance  data 
revealed  no  overall  listener  bias  to  report  "high"  or  "low,"  but  nonetheless, 
bias-free  d'  scores  were  employed.  Figure  3 presents  d'  levels  for  each  ISI 
and  feature  averaged  across  the  eight  listeners. 
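
The Elliott tables in Swets (1964) tabulate d' for forced-choice tasks, and exactly how they were applied here is not specified. As a hedged illustration of the general idea only, the sketch below uses the standard equal-variance Gaussian estimate for a single-interval two-category task, d' = z(H) - z(F), treating "high" responses to high stimuli as hits and "high" responses to low stimuli as false alarms. With no bias, H = 1 - F = Pc, so d' = 2z(Pc). These values need not match the table entries used in the report.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF

def d_prime(hit_rate, false_alarm_rate):
    """Equal-variance Gaussian d' = z(H) - z(F); rates must lie strictly in (0, 1)."""
    return _z(hit_rate) - _z(false_alarm_rate)

def d_prime_unbiased(percent_correct):
    """With no response bias, H = 1 - F = Pc, so d' = 2 * z(Pc)."""
    return 2 * _z(percent_correct / 100)

print(round(d_prime_unbiased(75), 3))   # 1.349
print(round(d_prime(0.75, 0.25), 3))    # 1.349
```

Proportions of exactly 0 or 1 make z infinite and would need a correction before conversion.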

Massaro has outlined a formal theory in which the discriminability level, d',
can be described by an exponential function of time, as in equation (1):

$$d' = \alpha\left(1 - e^{-\theta t}\right) \qquad (1)$$

where

$\alpha$ = the asymptotic level of d'
$\theta$ = a rate parameter reflecting the rate at which the asymptote is approached
$t$ = the total processing time of the stimulus (tone duration + ISI).

[Figure 3. Sensitivity for waveform, fundamental frequency, and formant frequency as a function of total processing time, t (msec; 20 to 520). Solid lines are predicted values given by equation (1).]

The greater the level of discriminability, the larger the value of $\alpha$.
Similarly, the greater the rate of perceptual processing, the larger the value
of $\theta$. The solid curves in Figure 3 represent predicted values of d' obtained
by fitting equation (1) to the observed data using a modified Levenberg-Marquardt
algorithm with a least squares criterion (cf. subroutine ZXSSQ in the
IMSL statistical library). Reasonably good fits were obtained for each of the
three curves (r² = .82, .91, and .93 for formant, fundamental, and waveform,
respectively).
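
The fitting step can be sketched in pure Python. Rather than the Levenberg-Marquardt routine (ZXSSQ) used in the report, this sketch uses a swapped-in variable-projection approach: for any fixed theta the least-squares alpha has a closed form, so only theta needs a one-dimensional search (a log-spaced grid followed by golden-section refinement). The d' values below are synthetic, generated from assumed parameters, not the report's data; t runs from 20 to 520 msec because t is the 20 msec tone plus the ISI.

```python
import math

def predicted_d(t, alpha, theta):
    """Equation (1): d' = alpha * (1 - exp(-theta * t)), with t in msec."""
    return alpha * (1.0 - math.exp(-theta * t))

def best_alpha(ts, ds, theta):
    # For fixed theta the model is linear in alpha, so the least-squares
    # alpha is sum(d * g) / sum(g * g), where g = 1 - exp(-theta * t).
    g = [1.0 - math.exp(-theta * t) for t in ts]
    return sum(d * gi for d, gi in zip(ds, g)) / sum(gi * gi for gi in g)

def sse(ts, ds, theta):
    """Residual sum of squares at theta, using the optimal alpha for that theta."""
    a = best_alpha(ts, ds, theta)
    return sum((d - predicted_d(t, a, theta)) ** 2 for t, d in zip(ts, ds))

def fit_exponential(ts, ds, lo=1e-4, hi=1.0, grid_size=200):
    """Return (alpha, theta) minimizing squared error; theta in 1/msec."""
    # 1) Coarse log-spaced grid over theta to bracket the minimum.
    grid = [lo * (hi / lo) ** (k / (grid_size - 1)) for k in range(grid_size)]
    k = min(range(grid_size), key=lambda i: sse(ts, ds, grid[i]))
    a, b = grid[max(k - 1, 0)], grid[min(k + 1, grid_size - 1)]
    # 2) Golden-section search inside the bracket.
    r = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(100):
        x1, x2 = b - r * (b - a), a + r * (b - a)
        if sse(ts, ds, x1) < sse(ts, ds, x2):
            b = x2
        else:
            a = x1
    theta = (a + b) / 2.0
    return best_alpha(ts, ds, theta), theta

def r_squared(ts, ds, alpha, theta):
    """Proportion of variance in d' accounted for by the fitted curve."""
    mean = sum(ds) / len(ds)
    ss_res = sum((d - predicted_d(t, alpha, theta)) ** 2 for t, d in zip(ts, ds))
    ss_tot = sum((d - mean) ** 2 for d in ds)
    return 1.0 - ss_res / ss_tot

# Total processing time t = 20 msec tone + ISI.
ts = [20 + isi for isi in (0, 40, 80, 160, 250, 350, 500)]
# Synthetic, noise-free d' values from assumed parameters (not the report's data).
ds = [predicted_d(t, 2.0, 0.015) for t in ts]
alpha, theta = fit_exponential(ts, ds)
print(round(alpha, 3), round(theta, 4), round(r_squared(ts, ds, alpha, theta), 3))
```

With real, noisy data the recovered parameters would differ from the generating ones and r² would fall below 1, as in the values reported above.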

A visual inspection of Figure 3 suggests that recognition improved with
increasing processing time for all three features. This observation was
confirmed by a two-way analysis of variance with repeated measures on both ISI
and feature (Myers, 1967). Significant main effects were observed for ISI,
F(6,42) = 18.98, p < .001, and Feature, F(2,14) = 6.55, p < .01. In addition,
a significant interaction was found between ISI and Feature, F(12,84) = 3.37,
p < .01, indicating that the duration of the interval between the signal
presentation and the masking white noise differentially affected classification
performance for the three features. This finding suggests that the time course of
the feature extraction process involved in the recognition interference task
depends on the auditory dimension or feature being analyzed.
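
The degrees of freedom reported above follow from the design: 8 listeners, 7 ISIs, and 3 features, with repeated measures on both factors, so each effect is tested against its interaction with subjects. A small arithmetic check (the function name is illustrative):

```python
def within_subjects_df(n_subjects, levels_a, levels_b):
    """(effect df, error df) for each effect in a two-way design with repeated
    measures on both factors; each effect's error term is its interaction
    with subjects."""
    s = n_subjects - 1
    a = levels_a - 1
    b = levels_b - 1
    return {
        "A": (a, a * s),
        "B": (b, b * s),
        "A x B": (a * b, a * b * s),
    }

# A = ISI (7 levels), B = feature (3 levels), 8 listeners.
print(within_subjects_df(8, 7, 3))
# {'A': (6, 42), 'B': (2, 14), 'A x B': (12, 84)}
```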

A further analysis of mean d' levels over sessions (t tests) revealed no
significant difference between any two sessions. This result, and the fact
that no practice effects were found with training, indicate that learning
effects were absent during the test phase of the experiment.

As previously described, an elaborate counterbalancing procedure was
employed for each auditory feature. For instance, stimuli varying on the
waveform dimension were counterbalanced for fundamental frequency and formant
center frequency. Consequently, four combinations of stimuli varying on
waveform were possible, and two listeners were presented with each particular
combination. To ensure that there were no differences in d' scores associated
with stimuli over all counterbalanced features, comparisons were made between
the mean performance (collapsed over ISI) of listeners in each group. No
significant differences were found.

III.  Discussion 

Two major conclusions are evident from the present experiment. First,
the noise was effective in disrupting processing of the auditory image created
by a short burst of a complex auditory signal. As in Massaro's earlier studies
(1970; 1975), performance reached an asymptotic level at an ISI between 160
and 250 msec. This suggests that the duration of the auditory image for complex
non-speech sounds is approximately 200 msec, generally consistent with estimates
from recognition interference studies employing both simple non-speech (Massaro,
1975) and speech (Wolf, 1976) signals.

Second, accurate identification of a complex signal depends critically
on its psychophysical structure. Differential performance on the three auditory
features was revealed both in the rate of information processing (as indicated
by the differently shaped performance-by-ISI functions for the three features)
and in the maximum performance reached (as indicated by the different
asymptotes for the three features). This suggests that either (a) different
feature extraction processes are involved in the analysis of the three
different features, or (b) the same feature extraction process proceeds at
different rates and to different levels for different features.

In order to further explore the acoustic correlates of performance in the present task, the physical structure of the signals was examined for possible correlates of classification performance. Each signal was subjected to a narrow-band spectral analysis using a Federal Scientific Spectral Analyzer, Model UA-100. Figure 4 presents the steady-state line spectra for two examples of each of the three features. The spectra are ordered from the most easily discriminable pair at the top to the least easily discriminable pair at the bottom. It should be noted that the spectra in Figure 4 were obtained from steady-state signals rather than from the 20 msec bursts used in the experiment. As Licklider (1951) has pointed out, the spectrum of a brief-duration signal is distorted or "smeared" relative to its steady-state spectrum. Nonetheless, the spectra in Figure 4 should be useful in clarifying the psychoacoustic basis of performance in the present task. Instances where spectral smearing is likely to have influenced performance will be noted below where appropriate.
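The size of the smearing Licklider describes can be estimated with a short calculation. The sketch below assumes that the 20 msec burst was gated on and off abruptly, i.e., by a rectangular window; the gating envelope is not described in this section, and ramped gating would change the lobe shape. Under that assumption each line in a steady-state spectrum is spread into a sinc-shaped lobe whose first nulls lie 1/T Hz on either side of it. The function names are illustrative only.

```python
import math

def rect_window_magnitude(offset_hz: float, duration_s: float) -> float:
    """|sin(x)/x| with x = pi * offset * T: the normalized magnitude spectrum of a
    rectangular gate of length T, evaluated offset_hz away from a spectral line.
    ASSUMPTION: the burst is rectangularly gated (no onset/offset ramps).
    """
    x = math.pi * offset_hz * duration_s
    if x == 0.0:
        return 1.0
    return abs(math.sin(x) / x)

def first_null_hz(duration_s: float) -> float:
    """Distance from a spectral line to the first zero of its smeared lobe."""
    return 1.0 / duration_s

# 20 msec burst (present study) vs. 3 s tone (scaling study)
for duration_s in (0.020, 3.0):
    print(f"{duration_s * 1000:g} msec -> first null {first_null_hz(duration_s):.2f} Hz away")
```

Under this assumption, a 20 msec burst spreads each component over roughly +/-50 Hz, which is the same size as the 50 Hz difference between the 90 and 140 Hz fundamentals, whereas a 3 s tone is smeared by only a fraction of a hertz.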

As is evident in Figure 4, the principal difference between the two waveforms lies in the distribution of harmonics. While both the square (Figure 4a) and triangular (Figure 4b) waves are composed of only odd harmonics, the distribution is much more steeply sloped for the triangular wave. At the intensity levels investigated in the present study, more harmonics would be audible for the square wave than for the triangular wave. The net result is that the partials present in the square wave influence more critical bands than those present in the triangular wave [below approximately 1 kHz, the critical bands are roughly ±50 Hz about the center frequency (Scharf, 1971)]. In the frequency condition (Figures 4c and 4d), the only physical difference is in the spacing between harmonics. That is, the harmonics of the 90 Hz fundamental (Figure 4c) are more closely spaced than those of the 140 Hz fundamental (Figure 4d).
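Why more harmonics survive for the square wave follows from the Fourier series of the two ideal waveforms: the amplitude of odd harmonic n falls off as 1/n for a square wave and as 1/n² for a triangular wave (about 6 and 12 dB per octave). The sketch below counts the odd harmonics lying within a given range of the fundamental's level. The 30 dB range is an arbitrary stand-in for audibility, not a value from the study, and the waveforms are idealized (no formant filtering).

```python
import math

# Ideal Fourier-series roll-off: amplitude of odd harmonic n ~ 1 / n**k
ROLLOFF_EXPONENT = {"square": 1, "triangle": 2}

def odd_harmonic_levels_db(wave: str, max_harmonic: int) -> dict[int, float]:
    """Level of each odd harmonic relative to the fundamental, in dB."""
    k = ROLLOFF_EXPONENT[wave]
    return {n: -20.0 * k * math.log10(n) for n in range(1, max_harmonic + 1, 2)}

def harmonics_within(wave: str, range_db: float, max_harmonic: int = 99) -> list[int]:
    """Odd harmonic numbers whose level lies within range_db of the fundamental."""
    levels = odd_harmonic_levels_db(wave, max_harmonic)
    return [n for n, level in levels.items() if level >= -range_db]

f0 = 90.0  # Hz; illustrative (90 Hz is the lower fundamental used in the study)
for wave in ("square", "triangle"):
    kept = harmonics_within(wave, range_db=30.0)  # 30 dB: hypothetical audibility range
    print(f"{wave}: {len(kept)} harmonics, highest at {kept[-1] * f0:.0f} Hz")
```

With this idealized roll-off, the square wave keeps harmonics up to the 31st within 30 dB of its fundamental, while the triangular wave keeps only the first, third, and fifth.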

For formant frequency, the only physical difference lies in the amplitude relations among all the harmonics. For example, in Figure 4e, a 90 Hz square wave with the formant centered at 600 Hz, the seventh harmonic was approximately 4 dB greater than the fifth or ninth. In Figure 4f, a 90 Hz square wave with a formant centered at 940 Hz, the eleventh harmonic was 4 dB greater than the ninth or thirteenth partial.
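Which harmonic a given formant emphasizes can be checked by finding the odd harmonic of the 90 Hz square wave nearest the formant's center frequency. The sketch assumes, for illustration only, that the formant filter's gain peaks at its nominal center frequency; the filter's bandwidth and shape are not given in this section.

```python
def nearest_odd_harmonic(f0_hz: float, center_hz: float, max_harmonic: int = 41) -> int:
    """Odd harmonic number whose frequency lies closest to center_hz.

    Square waves contain only odd harmonics, so even ones are skipped.
    ASSUMPTION: the formant filter boosts most strongly at its center frequency.
    """
    return min(range(1, max_harmonic + 1, 2), key=lambda n: abs(n * f0_hz - center_hz))

for center in (600, 940, 1000, 1200):  # formant centers used in Experiments 1 and 2
    n = nearest_odd_harmonic(90.0, center)
    print(f"formant {center} Hz -> harmonic {n} at {n * 90:.0f} Hz")
```

For the 600 and 940 Hz formants this recovers the seventh and eleventh harmonics named in the text.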

This analysis suggests that the number of audible harmonics (i.e., waveform), the distance between the harmonics (i.e., fundamental), and the amplitude relations among them (i.e., formant frequency) occupy, in that order, a descending hierarchy of subjective importance in the classification of brief-duration segments of these signals. It is clear, however, that this hierarchy of featural salience cannot be absolute or rigidly determined for all listening conditions. The Howard and Silverman (1976) multidimensional scaling study revealed that fundamental frequency was more salient (i.e., accounted for more of the judgment variance in the scaling solution) than either waveform or formant frequency. Although the stimuli employed in that study and the present one were identical, the stimulus durations were very different (3 seconds for Howard & Silverman and 20 msec for the present study). The tone bursts used in the present study would tend to preserve relatively more high- than low-frequency information, thereby emphasizing the subjective importance of the harmonics. For example, in the extreme case of the 90 Hz tone, the 20 msec burst would contain fewer than 2 complete cycles of the fundamental but 9 complete cycles of the fifth harmonic (450 Hz). In contrast, the Howard and Silverman study allowed the listener to integrate the signal over a longer period of time (3 seconds), enabling the observer to stress the fundamental--the frequency containing the greatest relative energy.
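The cycle counts behind this argument are simply frequency multiplied by duration. A minimal sketch (the function name is illustrative):

```python
def cycles_in_burst(freq_hz: float, duration_ms: float) -> float:
    """Number of periods of a component at freq_hz that fit within the burst."""
    return freq_hz * duration_ms / 1000.0

f0 = 90.0  # Hz, the lowest fundamental used in the study
for n in (1, 3, 5):
    freq = n * f0
    print(f"harmonic {n} ({freq:.0f} Hz): {cycles_in_burst(freq, 20):.1f} cycles in 20 msec, "
          f"{cycles_in_burst(freq, 3000):.0f} cycles in 3 s")
```

The same calculation gives 2.4, 2.6, and 2.8 cycles for 20 msec bursts at 120, 130, and 140 Hz, the other fundamentals used in the study.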

Implicit in the above analysis is the assumption that the three auditory features (fundamental frequency, waveform and formant frequency) are revealed or extracted from a complex sound by three different feature extraction processes. However, as pointed out above, an equally viable alternative is that the three features differ only along the single dimension of discriminability, and therefore differentially influence a single feature extraction process. According to the latter hypothesis, the three features examined in the present study lead to different sensitivity-by-ISI functions primarily because the specific feature values investigated were not equated for discriminability.

For example, it is conceivable that the psychological processes responsible for distinguishing two square waves that differ by 10 Hz are no different from those responsible for distinguishing sounds having two distinctive formants but identical fundamentals. Since no previous recognition interference studies have investigated the effects of discriminability on classification performance, the effects of this variable and its role in determining featural saliency remain unclear. Experiments 2 and 3 are designed to examine the effects of discriminability on recognition performance for two auditory features (formant frequency and fundamental frequency, respectively).

EXPERIMENT 2

In this experiment, formant frequency is varied along a single continuum of discriminability by changing the separation between the center frequencies of the formants in 90 Hz square waves. Listeners are required to classify signals into two categories on the basis of formant frequency under three conditions: the original formant center-frequency pair of 600/940 Hz, a more discriminable pair of 600/1000 Hz, and a still more widely separated pair of 600/1200 Hz. If recognition performance varies with discriminability in the same manner as in Experiment 1 (i.e., different asymptotes and rates for the three levels), this would suggest that the results of Experiment 1 could be attributed to discriminability differences and not to any unique psychological characteristics of the different features. If, on the other hand, recognition performance in the present study differs from that of Experiment 1 (e.g., asymptotes, but not rates, differ for the three levels), this would provide evidence that the three features are extracted by distinct psychological processes.

I. Method

A. Observers

Two male and two female undergraduates, aged 19-22 years, served as observers. None had participated in the first experiment or had any known history of hearing disorders. Instructions and payment were identical to those of Experiment 1.

B. Apparatus

The apparatus was identical to that of Experiment 1. Three stimulus tapes were generated by passing a 90 Hz square wave through the previously described formant filter centered at one of four different frequencies. One tape was recorded with the formant frequency centered at 600 Hz on one channel and 1000 Hz on the other. A second tape was similarly recorded with the formant frequencies centered at 600 and 1200 Hz. A formant tape recorded for the first experiment (center frequencies at 600 and 940 Hz) was used for the third stimulus pair. All stimuli were presented at a listening level of 83 dB SPL.

II. Results

Figure 5 presents the d' levels averaged over all four listeners. For all levels of formant discriminability, recognition performance improved with increasing ISI. These data were analyzed with a two-way repeated measures analysis of variance. As suggested by visual inspection of Figure 5, main effects were observed for ISI, F(6,18) = 14.40, p < .01, and Discriminability, F(2,6) = 26.72, p < .001. However, the ISI × Discriminability interaction did not approach statistical significance, F(12,36) < 1.0. This result suggests that while higher asymptotic performance occurs for sound pairs with more widely separated formant frequencies, the rate at which this asymptote is approached with increasing ISI does not differ for the three discriminability levels examined.

Figure 5. Sensitivity for three levels of formant frequency discriminability as a function of total processing time, t. Solid lines are predicted values given by equation [1].
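The degrees of freedom reported above are those of a two-way, fully within-subjects design with four listeners, seven ISIs (as implied by the ISI degrees of freedom), and three discriminability levels, in which each effect is tested against its own interaction with subjects. The sketch below reproduces them; it computes degrees of freedom only, not F or p values, and the function name is illustrative.

```python
def rm_anova_dfs(n_subjects: int, levels_a: int, levels_b: int) -> dict[str, tuple[int, int]]:
    """(effect df, error df) for each effect in a two-way repeated measures ANOVA.

    Each effect's error term is the effect-by-subjects interaction, so its
    error df is the effect df multiplied by (n_subjects - 1).
    """
    s = n_subjects - 1
    a = levels_a - 1
    b = levels_b - 1
    return {"A": (a, a * s), "B": (b, b * s), "AxB": (a * b, a * b * s)}

# A = ISI (7 levels), B = Discriminability (3 levels), 4 listeners
print(rm_anova_dfs(n_subjects=4, levels_a=7, levels_b=3))
# {'A': (6, 18), 'B': (2, 6), 'AxB': (12, 36)}
```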

Curves of predicted d' values were fitted to the data using equation 1 and the procedure described in Experiment 1. The predicted values are displayed as solid curves in Figure 5. Extremely good fits were obtained for the 600/940 and 600/1000 conditions (r² = .99 and .97, respectively); however, equation 1 provided a relatively poor description of the data for the 600/1200 condition (r² = .62).
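Equation [1] and the fitting procedure of Experiment 1 are not reproduced in this section. As a hedged sketch, assume that equation [1] is an exponential approach to an asymptote, d' = a(1 - e^(-θt)), with asymptote a, rate θ, and total processing time t, and that the reported r² is the squared correlation between observed and predicted d'. Both are assumptions. For a fixed θ, the least-squares asymptote has the closed form a = Σgd / Σg² with g = 1 - e^(-θt), so a one-dimensional grid search over θ is sufficient.

```python
import math

def predicted_dprime(t_ms: float, a: float, theta: float) -> float:
    """ASSUMED form of equation [1]: d' rises exponentially toward asymptote a."""
    return a * (1.0 - math.exp(-theta * t_ms))

def fit_asymptote_and_rate(ts, ds, thetas):
    """Least-squares fit of (a, theta): grid search over theta, closed-form a.

    For fixed theta, minimizing sum((d - a*g)**2) with g = 1 - exp(-theta*t)
    gives a = sum(g*d) / sum(g*g).
    Returns (a, theta, sum of squared errors) for the best theta.
    """
    best = None
    for theta in thetas:
        g = [1.0 - math.exp(-theta * t) for t in ts]
        a = sum(gi * di for gi, di in zip(g, ds)) / sum(gi * gi for gi in g)
        sse = sum((di - a * gi) ** 2 for gi, di in zip(g, ds))
        if best is None or sse < best[2]:
            best = (a, theta, sse)
    return best

def r_squared(observed, predicted):
    """ASSUMED definition: squared Pearson correlation of observed vs. predicted."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)

# Self-check on HYPOTHETICAL noiseless values generated by the model itself
# (not the study's data): the fit should recover a = 3.0 and theta = 0.02.
ts = [20, 60, 100, 140, 180, 220, 260]  # hypothetical processing times, msec
ds = [predicted_dprime(t, a=3.0, theta=0.02) for t in ts]
a_hat, theta_hat, sse = fit_asymptote_and_rate(ts, ds, [k / 1000 for k in range(1, 101)])
fitted = [predicted_dprime(t, a_hat, theta_hat) for t in ts]
print(round(a_hat, 3), theta_hat, round(r_squared(ds, fitted), 6))
```

The self-check tests the fitting code rather than reanalysing the study, because the individual data points are shown only graphically.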

III. Discussion

The results of this experiment clearly reveal different asymptotic performance levels for stimulus pairs differing only in the frequency separation between formants. This finding indicates that asymptotic performance is determined largely by feature discriminability rather than by any inherent perceptual characteristic of the auditory feature itself. For example, asymptotic performance for the 600/1200 condition in the present experiment (a = 4.16) far exceeded that observed for the waveform condition of Experiment 1 (a = 3.05). This occurred despite the fact that in Experiment 1 the formant condition produced the lowest asymptote (a = 1.34).

The present findings also suggest that similar processing rates occur for stimuli differing only in discriminability along a single auditory feature. Although this result is not immediately obvious from a visual inspection of Figure 5, a statistical analysis of the data clearly revealed no significant ISI × Discriminability interaction. This finding contrasts with the highly significant interaction observed in Experiment 1, where feature was varied.
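How curves with different asymptotes can share one processing rate, and why this is hard to see in a plot, can be illustrated with a short calculation. Equation [1] is not reproduced in this section; assume, as an illustration, that it has the exponential form d' = a(1 - e^(-θt)) with asymptote a and rate θ. Under that assumption the fraction of its own asymptote that a curve has reached at time t is 1 - e^(-θt), which does not depend on a. Curves that differ only in a therefore rise at the same relative rate, even though on a common d' axis the higher curve appears to climb more steeply. The θ value below is hypothetical, and the asymptotes are the Experiment 2 values reported in the General Discussion.

```python
import math

def assumed_dprime(t_ms: float, a: float, theta: float) -> float:
    """ASSUMED exponential form of equation [1]: asymptote a, rate theta (per msec)."""
    return a * (1.0 - math.exp(-theta * t_ms))

def time_to_fraction_ms(theta: float, fraction: float) -> float:
    """Time at which the assumed curve reaches `fraction` of its own asymptote.

    Solving fraction = 1 - exp(-theta * t) gives t = -ln(1 - fraction) / theta,
    an expression that does not involve the asymptote a.
    """
    if not 0.0 < fraction < 1.0:
        raise ValueError("fraction must lie strictly between 0 and 1")
    return -math.log(1.0 - fraction) / theta

theta = 0.02  # hypothetical common rate parameter, per msec
t90 = time_to_fraction_ms(theta, 0.9)
for a in (2.06, 3.33, 4.16):  # Experiment 2 asymptotes
    d = assumed_dprime(t90, a, theta)
    print(f"a = {a}: d' at {t90:.1f} msec = {d:.2f} ({d / a:.0%} of asymptote)")
```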

EXPERIMENT 3

Experiment 3 was designed to extend the findings of Experiment 2 to a second auditory feature, fundamental frequency. Observers were required to categorize brief-duration complex sounds as "high" or "low" on the basis of their fundamental frequency. As in Experiment 2, three sound pairs of varying discriminability were employed, and each signal presentation was followed by an interfering white noise burst. If the effects of discriminability noted in Experiment 2 apply to the recognition of features other than formant frequency, similar effects should be observed in the present study.

I. Method

A. Observers

Two male and two female observers participated in the experiment. No participant had served in either of the previous experiments or reported any known history of hearing disorders. Instructions and payments were identical to those of Experiment 1.

B. Apparatus

The apparatus was identical to that of Experiment 1. Stimulus tapes were generated by recording a square wave of 90, 120, 130 or 140 Hz fundamental through the formant filter centered at 600 Hz. Three test pairs of 90/120, 90/130, and 90/140 Hz were produced to obtain three levels of discriminability. The signals were subjectively equated for loudness before being presented at a comfortable listening level (83 dB SPL).

II. Results

Mean d' values, averaged across the four observers, are displayed in Figure 6. As in both earlier experiments, performance improved with increasing ISI up to an asymptote at approximately 200 msec, F(6,18) = 8.70, p < .001.
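How the classification responses were converted to d' is not described in this section. A common approach for a two-category task, assumed here, treats one category as the "signal": the hit rate is P("high" | high fundamental), the false-alarm rate is P("high" | low fundamental), and d' = z(H) - z(F) under the equal-variance Gaussian model. The counts in the example are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Equal-variance Gaussian sensitivity index, d' = z(H) - z(F).

    Rates of exactly 0 or 1 must be adjusted beforehand, because
    NormalDist.inv_cdf rejects them.
    """
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# HYPOTHETICAL counts for one observer at one ISI (not the study's data):
# "high" responses on 84 of 100 high-fundamental trials and 16 of 100 low ones.
print(f"{d_prime(84 / 100, 16 / 100):.2f}")
```

A mean d' such as those plotted in Figure 6 would then be the average of per-observer values at each ISI.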

In addition, as in Experiment 2, higher asymptotic performance was observed for the more widely separated signal pairs, F(2,6) = 9.87, p < .025, but no reliable differences were observed in the rate at which performance approached this asymptote with increasing ISI, F(12,36) = 1.43, p > .10.

Predicted d' values, displayed as the solid curves in Figure 6, were fitted to the data as described above. However, in contrast to the earlier experiments, relatively poor fits were obtained for all three conditions (r² = .53, .80, and .62 for the 90/120, 90/130, and 90/140 conditions, respectively). Visual inspection of Figure 6 suggests that the worst fits occur at the shortest ISIs. This observation was confirmed by a closer examination of the data, which revealed substantially larger mean squared deviations from the predicted values for the first three ISIs (deviations of .079, .071, and .170 for the 90/120, 90/130, and 90/140 conditions, respectively) than for the last four (deviations of .022, .065, and .024, respectively). In short, equation 1 does not appear to provide an adequate description of the present data at short ISIs.
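The split comparison above amounts to computing the mean squared deviation between observed and predicted d' separately for the first three and the last four of the seven ISIs. A minimal sketch; the numbers in the example are illustrative and are not the study's data.

```python
def mean_squared_deviation(observed, predicted):
    """Mean of the squared differences between observed and predicted d' values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed)

def split_msd(observed, predicted, n_short=3):
    """MSD for the first n_short ISIs and for the remaining (longer) ISIs."""
    return (mean_squared_deviation(observed[:n_short], predicted[:n_short]),
            mean_squared_deviation(observed[n_short:], predicted[n_short:]))

# ILLUSTRATIVE values for seven ISIs, ordered from shortest to longest
observed = [1.0, 2.0, 3.0, 4.0, 4.0, 4.0, 4.0]
predicted = [1.3, 2.3, 2.7, 4.0, 4.1, 3.9, 4.0]
short_msd, long_msd = split_msd(observed, predicted)
print(f"short ISIs: {short_msd:.3f}, long ISIs: {long_msd:.3f}")
```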

III. Discussion

In general, the results of this experiment are consistent with those reported for formant frequency in Experiment 2. Once again, discriminability had a large effect on asymptotic performance, with more widely separated stimulus pairs showing correspondingly higher asymptotes. In addition, asymptotic performance levels for all three discriminations investigated in the present study (a = 1.97, 2.51, and 2.86 for the 90/120, 90/130, and 90/140 Hz conditions, respectively) were higher than that observed for fundamental frequency in Experiment 1 (a = 1.92 for a 90/140 Hz discrimination). These findings clearly indicate that asymptotic performance depends more on the discriminability of critical features in a complex sound than on any singular properties of the feature itself.

Figure 6. Sensitivity for three levels of fundamental frequency discriminability as a function of total processing time, t (msec). Solid lines are predicted values given by equation [1].

A second finding of the present study, again consistent with Experiment 2, is that no statistically reliable ISI × Discriminability interaction was observed. As indicated above, this finding suggests that fundamental frequency is processed at a constant rate regardless of its relative salience or discriminability in the acoustic environment. While this finding is statistically clear, processing rate must be interpreted with some caution in the present study. In particular, the data presented in Figure 6 appear to approach asymptote in a sudden rather than gradual manner, as indicated by the relatively poor fit of equation 1. This finding is not surprising given the low frequencies investigated in the present study. Under optimal presentation conditions, a 20 msec burst would contain only 1.8, 2.4, 2.6, and 2.8 cycles at fundamentals of 90, 120, 130, and 140 Hz, respectively. The present data indicate that a minimum processing time of between 100 and 180 msec is required to extract sufficient information from the auditory image to make this discrimination.

GENERAL DISCUSSION

The present study examined the sensitivity of listeners to three selected auditory features in a two-choice recognition interference task. Two major findings are evident. First, asymptotic recognition performance in this task depends on a variety of stimulus as well as subjective factors. However, there is little evidence in the present study to indicate that featural saliency, as reflected by asymptotic performance, is determined by any unique perceptual property of individual features. Second, the results suggest that different auditory features are processed at different rates, but that processing rate remains invariant when discriminability is varied with feature held constant. These major findings are discussed in more detail below.

Asymptotic Performance

Listeners in the present study were clearly able to respond selectively to specific auditory features in brief-duration, complex non-speech sounds. Experiment 1 revealed substantial differences in the performance asymptotes for three different auditory features (waveform, fundamental frequency, and formant frequency). A comparison of these findings with the results of Experiments 2 and 3 suggests that at least two stimulus parameters are important in determining asymptotic performance level: signal duration and stimulus discriminability.

The importance of signal duration became evident in a comparison of the results of Experiment 1 and the earlier scaling study. In Experiment 1, waveform produced the highest asymptote, with fundamental and formant yielding successively lower overall performance levels. In the scaling study, on the other hand, fundamental frequency accounted for substantially more of the total judgment variance than did waveform. Although the signals were identical in the two studies, two major differences may be noted: (1) the responses required were quite different, and (2) dramatically different signal durations were used. In Experiment 1, listeners categorized brief presentations (20 msec) of individual sounds, whereas in the scaling study, listeners were required to make relative similarity judgments of successively presented three-second segments of two sounds. Although it is impossible to rule out the possibility that listeners use different criteria in making classification responses and similarity judgments, it is more likely that signal duration was the key factor underlying the different pattern of results observed in these studies. As was indicated above, the tone bursts used in Experiment 1 would tend to preserve relatively more high- than low-frequency information, and hence the subjective importance of waveform would be enhanced relative to fundamental. On the other hand, when listeners had a sufficiently long integration period, as in the scaling study, the low-frequency fundamental would become more important relative to waveform.

The second stimulus factor influencing asymptotic performance level is signal discriminability. Experiments 2 and 3 revealed higher asymptotic performance for signal pairs having larger physical differences. In both experiments where discriminability was varied, asymptotic performance equalled or exceeded that observed for the corresponding conditions in Experiment 1.

The implication of these findings is that differences in asymptotic performance cannot be attributed to any invariant subjective property of individual auditory features. Rather, at least two physical characteristics of the stimulus, its duration and relative distinctiveness, have a major influence on a listener's ability to selectively respond to individual features in a complex non-speech sound.

In addition to these stimulus factors, at least one subjective factor--the amount of prior experience in classifying sounds on the basis of a particular feature--played a primary role in determining asymptotic performance in the present study. When absolute performance levels are compared for the corresponding conditions across the present experiments, it becomes evident that Experiments 2 and 3 produced substantially higher overall levels than did Experiment 1. For example, in Experiment 2, where formant discriminability was varied, the asymptotes for all three discriminability levels (a = 2.06, 3.33, and 4.16) were higher than that observed for the formant condition in Experiment 1 (a = 1.34). Similarly, the three asymptotes observed when fundamental separation was varied in Experiment 3 (a = 1.97, 2.51, and 2.86) all exceeded the asymptote for the comparable condition in Experiment 1 (a = 1.92).

Since this discrepancy occurred even for identical signal pairs, the differences cannot be attributed to any physical characteristics of the auditory waveforms. Similarly, differences in the absolute amount of practice cannot account for these findings, since listeners in all three experiments received the same number of trials. It should be noted, however, that while all listeners had an identical amount of overall experience in the task, listeners in Experiments 2 and 3 always classified signals on the basis of a specific feature, whereas the listeners in Experiment 1 classified stimuli on the basis of three different features on separate days. Hence, it appears that experience with a specific feature leads to substantially better performance than does experience with different features.

Two different psychological mechanisms could underlie this effect. First, observers may simply develop an improved ability to selectively focus their feature extraction efforts on a particular relevant feature and to ignore other, irrelevant features. Similar improvements in selective attention have been reported for experienced listeners discriminating temporally varying tonal patterns (Watson, Wroton, Kelly, & Benbassat, 1975). It should be noted, however, that if an improvement in attentional ability were occurring, the relative lack of improvement observed in Experiment 1 would suggest that such improvement is quite feature specific.

Second, it is possible that the observed differences reflect an increased absolute sensitivity to the practiced feature. In other words, extended practice may improve the efficiency or accuracy of the feature extraction process itself. Although the present study was not designed to distinguish between these alternatives, it is clear that the performance levels observed in the present experiments--especially Experiment 1--do not reflect any absolute limit on a listener's ability to recognize specific features in brief-duration complex sounds. It remains for future research to empirically clarify the two possible psychological mechanisms outlined above.

Rate of Processing

One of the most obvious findings of the present study was that performance improved with increasing ISI. The longer the period between signal presentation and the interfering noise burst, the higher the performance level, up to some optimal asymptotic value. As indicated above, this finding is consistent with Massaro's (1975) earlier data and with his argument that brief-duration sounds are retained in a short-lived auditory memory for further processing. In Massaro's account, the time between signal offset and noise onset is consumed by "primary recognition processes," which transform the relatively unprocessed auditory image into a categorical (i.e., recognized) representation. Since the interfering noise is thought to disrupt and terminate this process, the rate of recognition processing is revealed in the rate at which performance improves with increasing ISI.

In Experiment 1 of the present study, ISI was observed to influence performance differentially for the three auditory features. In contrast, when feature was held constant while discriminability was varied in Experiments 2 and 3, no statistically reliable difference was noted in the effects of ISI on performance. Since postasymptotic data in the three experiments were relatively stable, these results suggest that different auditory features are processed at different rates, but that the rate of processing for a particular feature remains invariant when factors influencing asymptotic performance are varied. This would be expected if different feature extraction processes were applied in the analysis of the three features examined in the present study. Furthermore, this interpretation is also consistent with the previously discussed finding that extended practice selectively improves performance for particular features. However, despite its apparent theoretical consistency, this finding must be interpreted with caution for at least two reasons. First, it is based on a comparison of data across experiments using different listeners and different numbers of participants. Second, a comparison of processing rates across conditions assumes that performance improves in an orderly and systematic fashion as ISI is increased. In the present data, relatively large variance was observed at the short ISIs (i.e., 0, 40, and 80 msec), and, as indicated above, the data of Experiment 3 appear to show a sudden rather than gradual improvement with increasing ISI.

Implications

The present findings clearly indicate that the stimulus environment plays an important role in determining the subjective importance of auditory features. They suggest that selective changes in the auditory environment (e.g., through signal preprocessing) could significantly enhance a listener's ability to recognize even relatively subtle acoustic cues (e.g., formant frequency). The findings also suggest that practiced listeners may have an improved ability to selectively focus their attention on specific auditory cues in a complex aural display. It appears that extended practice with important but initially nonsalient cues would lead to a greater performance increment than would an equivalent amount of practice with a variety of cues. Overall, the backward recognition interference paradigm proved to be a valuable tool for examining the effects of various signal parameters on an observer's ability to extract information from complex sounds. In addition, its potential for examining the time course of auditory information processing may prove invaluable.